Swamp: an Isometric Frontend for Speaker Clustering

Author

Patrick Nguyen

Abstract

In this paper, we describe a non-linear feature normalization based on Riemannian differential geometry. This feature normalization yields parameters that are invariant under any bijective stationary transformation. Moreover, it is robust to additive noise that is uncorrelated with speech and quasi-stationary. The only requirement is that of ergodicity. The frontend is called SWAMP (Sweeping Metric Parameterization). The frontend assumes that speech resides in a small, smooth manifold that is entirely and densely explored during the course of an utterance. It first observes the tangent spaces at every point of the manifold. This defines a local Riemannian geometry. Under this geometry, we are able to measure geodesic lengths on the manifold. These lengths are invariant under non-linear transformations. Therefore, we are able to locate a point invariantly by measuring its relative distance to all other observed points. Through classical multi-dimensional scaling, we map this triangulation to a canonical, Euclidean, isometric space inherent to the observed manifold. Combined with standard features, SWAMP features are shown to improve speaker clustering on Broadcast News.
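The final step named in the abstract, classical multi-dimensional scaling, can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the pairwise geodesic distances are already available as a symmetric matrix, and the function name `classical_mds`, the parameter `n_components`, and the toy unit-square data are all hypothetical. Plain Euclidean distances stand in for geodesic lengths here.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in Euclidean space from a symmetric pairwise distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix:
    # B = -1/2 * J * D^2 * J, with J = I - (1/n) * 11^T.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues, which appear when the distances
    # are not exactly Euclidean (or from rounding error).
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales


# Toy data (an assumption for illustration): corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

embedding = classical_mds(dist, n_components=2)
# Pairwise distances in the embedding, for comparison with the input.
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

The embedding is determined only up to rotation, reflection, and translation, so it is the pairwise distances in `recovered`, not the coordinates themselves, that should be compared with the input.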

Similar resources

Analysis of gender normalization using MLP and VTLN features

This paper analyzes the capability of multilayer perceptron frontends to perform speaker normalization. We find the context decision tree to be a very useful tool to assess the speaker normalization power of different frontends. We introduce a gender question into the training of the phonetic context decision tree. After the context clustering, the gender-specific models are counted. We compare ...

Bilateral Weighted Fuzzy C-Means Clustering

Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...

E-HMM approach for learning and adapting sound models for speaker indexing

This paper presents an iterative process for blind speaker indexing based on an HMM. This process detects and adds speakers one after the other to the evolutive HMM (E-HMM). The use of this HMM approach takes advantage of the different components of the AMIRAL automatic speaker recognition system (ASR system: frontend processing, learning, log-likelihood ratio computing) from LIA. The proposed soluti...

On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification

Conventional Speaker Identification (SI) systems utilise spectral features like Mel-Frequency Cepstral Coefficients (MFCC) or Perceptual Linear Prediction (PLP) as a frontend module. Line Spectral pairs Frequencies (LSF) are a popular alternative representation of Linear Prediction Coefficients (LPC). In this paper, an investigation is carried out to extract LSF from perceptually modified speech....

Voting for two speaker segmentation

The process of locating the end points of each speaker's voice in an audio file and then clustering segments based on speaker identity is called speaker segmentation. In this paper we present a method for two speaker segmentation, though it can be extended to more than two speakers. Most methods for speaker segmentation and clustering start with an initial computationally inexpensive speaker seg...

Publication date: 2003
